ROBPCA: A New Approach to Robust Principal Component Analysis

Authors

  • Mia Hubert
  • Peter Rousseeuw
  • Karlien Vanden Branden

Abstract

In this paper we introduce a new method for robust principal component analysis (PCA). Classical PCA is based on the empirical covariance matrix of the data and is therefore highly sensitive to outlying observations. Two robust approaches have been developed in the past. The first is based on the eigenvectors of a robust scatter matrix, such as the MCD or an S-estimator, and is limited to relatively low-dimensional data. The second is based on projection pursuit and can handle high-dimensional data. Here we propose the ROBPCA approach, which combines projection pursuit ideas with robust scatter matrix estimation. It yields more accurate estimates on uncontaminated data sets and more robust estimates on contaminated data. ROBPCA can be computed quickly and is able to detect exact-fit situations. As a byproduct, ROBPCA produces a diagnostic plot that displays and classifies the outliers. The algorithm is applied to several data sets from chemometrics and engineering.
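The abstract describes ROBPCA only at a high level. The NumPy sketch below illustrates the general idea of combining projection pursuit with PCA on a clean subset, together with a two-distance diagnostic. It is not the authors' algorithm. Several assumptions are built in. Outlyingness is measured over directions through random pairs of points, using median and MAD as the univariate location and scale. The robust scatter matrix estimation mentioned above is replaced by a plain covariance of the h least outlying points. The diagnostic plot is assumed to contrast a score distance (inside the PCA subspace) with an orthogonal distance (to the subspace). Reweighting, exact-fit handling and outlier cutoffs are omitted. The function names `outlyingness`, `robust_pca_sketch` and `outlier_map`, and the parameters `n_dirs` and `alpha`, are hypothetical.

```python
import numpy as np


def outlyingness(X, n_dirs=250, rng=None):
    """Projection-pursuit outlyingness: for each point, the largest robust
    z-score of its projection over directions through random pairs of points.
    Median and MAD serve as univariate location and scale (an assumption of
    this sketch)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    out = np.zeros(n)
    for _ in range(n_dirs):
        i, j = rng.choice(n, size=2, replace=False)
        v = X[i] - X[j]
        norm = np.linalg.norm(v)
        if norm == 0:
            continue  # duplicate points give no direction
        proj = X @ (v / norm)
        med = np.median(proj)
        mad = np.median(np.abs(proj - med))
        if mad == 0:
            continue  # degenerate direction; this sketch simply skips it
        out = np.maximum(out, np.abs(proj - med) / mad)
    return out


def robust_pca_sketch(X, k, alpha=0.75, rng=None):
    """Hypothetical helper: classical PCA on the h least outlying points."""
    n = X.shape[0]
    h = int(np.ceil(alpha * n))
    keep = np.argsort(outlyingness(X, rng=rng))[:h]
    center = X[keep].mean(axis=0)
    cov = np.cov(X[keep], rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return center, eigvecs[:, top], eigvals[top]


def outlier_map(X, center, P, lam):
    """Score distance (inside the PCA subspace) and orthogonal distance (to it)."""
    Xc = X - center
    T = Xc @ P                                  # scores, shape (n, k)
    sd = np.sqrt(np.sum(T**2 / lam, axis=1))    # Mahalanobis distance of the scores
    od = np.linalg.norm(Xc - T @ P.T, axis=1)   # length of the residual vector
    return sd, od


# Toy data: 90 points near a 2-D plane in R^5, plus 10 points pushed off it.
rng = np.random.default_rng(0)
X_clean = np.zeros((90, 5))
X_clean[:, :2] = rng.normal(size=(90, 2)) * [3.0, 1.0]
X_clean += 0.05 * rng.normal(size=(90, 5))
X_out = 0.1 * rng.normal(size=(10, 5)) + [0.0, 0.0, 8.0, 8.0, 8.0]
X = np.vstack([X_clean, X_out])

center, P, lam = robust_pca_sketch(X, k=2, rng=1)
sd, od = outlier_map(X, center, P, lam)
print("largest OD among regular points:", od[:90].max())
print("smallest OD among added outliers:", od[90:].min())
```

On this toy data the 10 points pushed off the plane should have far larger orthogonal distances than the regular points. In a diagnostic plot one would draw `od` against `sd` and flag points beyond cutoff values, which this sketch does not compute.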

Similar sources

Robust PCA and classification in biosciences

MOTIVATION Principal components analysis (PCA) is a very popular dimension reduction technique that is widely used as a first step in the analysis of high-dimensional microarray data. However, the classical approach that is based on the mean and the sample covariance matrix of the data is very sensitive to outliers. Also, classification methods based on this covariance matrix do not give good r...

Robust PCA for skewed data and its outlier map

The outlier sensitivity of classical principal component analysis (PCA) has spurred the development of robust techniques. Existing robust PCA methods like ROBPCA work best if the non-outlying data have an approximately symmetric distribution. When the original variables are skewed, too many points tend to be flagged as outlying. A robust PCA method is developed which is also suitable for skewed...

Robust classification in high dimensions based on the SIMCA Method

In this paper we first investigate the robustness of the SIMCA method for classifying high-dimensional observations. It turns out that both stages of the algorithm, the estimation of principal components and the construction of a classification rule, can be highly disturbed by the presence of outliers. Therefore we propose a robust procedure RSIMCA which is based on a robust Principal Component...

An application of principal component analysis and logistic regression to facilitate production scheduling decision support system: an automotive industry case

Production planning and control (PPC) systems have to deal with rising complexity and dynamics. The complexity of planning tasks is due to multiple variables and dynamic factors derived from the uncertainties surrounding PPC. Although the literature on exact scheduling algorithms, simulation approaches, and heuristic methods is extensive in production planning, they seem to be ineff...

Using Surface-Enhanced Raman Scattering (SERS) and Fluorescence Spectroscopy for Screening Yeast Extracts, A Complex Component of Cell Culture

Yeastolate or yeast extract which are hydrolysates produced by autolysis of yeast, are often employed as a raw material in the media used for industrial mammalian cell culture. The source and quality of yeastolate can significantly affect cell growth and production, however, analysis of these complex biologically-derived materials is not straightforward. The best current method, liquid chromato...

Journal:
  • Technometrics

Volume 47

Publication date: 2005